Using the AzureML R package to connect Azure ML Studio and R

This notebook demonstrates some of the capabilities of the AzureML package:

  • Read, upload, download, and delete datasets in an Azure ML workspace
  • Read intermediate data from Azure ML experiment
  • Allow for a concise way of publishing and consuming web services

This notebook assumes a basic understanding of Azure Machine Learning Studio. Specifically, you should:

  • Know how to get the workspace ID and authorization token
    • Note that this step is not necessary in Azure ML Jupyter notebooks, because the notebook service stores your credentials in the file system
  • Understand setting up web services on Azure

If you are completely new to Azure ML, the Tutorial for Data Scientists can help you get started. All results shown here are from my own Azure ML workspace.

Note that you don't have to specify your own workspace ID and authorization token to run the code in this notebook.
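
If you do run the code locally, outside the Azure ML notebook service, you can pass credentials to workspace() explicitly. Below is a minimal sketch, not run here; the placeholder strings are values you copy from the Settings page of Azure ML Studio, and ?workspace describes an alternative settings.json configuration file the package can read from.

# Not run: connect from a local R session with explicit credentials
# library(AzureML)
# ws <- workspace(id = "your workspace ID", auth = "your authorization token")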

Load the package

The package is already installed in the Jupyter notebook service on Azure ML.

Start by loading the package.


In [1]:
library(AzureML)


Warning message:
: package 'AzureML' was built under R version 3.2.3

Work with the workspace

The AzureML R package allows you to work with your workspace directly. Specifically, you can read, upload, download, and delete datasets in an Azure ML workspace.

Connect to the AzureML workspace

Start by setting up a connection to the AzureML workspace.


In [2]:
# Connect to the workspace
ws <- workspace()

List datasets

The datasets() function (or, equivalently, the datasets element of the workspace object) returns information about all the datasets in the workspace, including the sample datasets provided by Microsoft.


In [3]:
# list the first several sample datasets in the workspace
head(datasets(ws, filter = "sample")$Name)


Out[3]:
  1. "text.preprocessing.zip"
  2. "fraudTemplateUtil.zip"
  3. "testDataSource_a5e9faf817084997981d094ef040389b"
  4. "MetaAnalytics.Test.GlobalDataset.IntegerTSVFile"
  5. "MetaAnalytics.Test.GlobalDataset.IntegerCSVFile"
  6. "testDataSource_65885ed31a854d70962a02d5641d3c92"

Download a dataset

To download a dataset, use the download.datasets() function.


In [4]:
# download datasets
movies <- download.datasets(ws, name = "Movie Ratings")
head(movies)


Out[4]:
  UserId MovieId Rating  Timestamp
1      1   68646     10 1381620027
2      1  113277     10 1379466669
3      2  454876      8 1394818630
4      2  790636      7 1389963947
5      2  816711      8 1379963769
6      2 1091191      7 1391173869

In [5]:
options(repr.plot.width = 6, repr.plot.height = 4)
hist(movies$Rating, main = "Rating", xlab = NULL)


Upload a dataset

We'll use the airquality dataset that comes with base R to show how a dataset can be uploaded. Note that if a dataset with the same name already exists in the workspace, an error will be reported.
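
If you need to re-run the upload, one way to avoid that error is to remove any existing dataset of the same name first; a minimal sketch using only functions shown elsewhere in this notebook:

# delete a previously uploaded copy, if any, before uploading again
if ("my air quality" %in% datasets(ws)$Name) {
  delete.datasets(ws, name = "my air quality")
}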


In [6]:
airquality[1:10,]


Out[6]:
   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9
10    NA     194  8.6   69     5  10

In [7]:
# uploading R data frame to Azure ML workspace
mydata <- airquality[1:10,]
# information about the uploaded dataset in the workspace will be returned
upload.dataset(mydata, ws, name = "my air quality")


Out[7]:
The result is a one-row data frame describing the uploaded dataset. The full output is very wide, so only the key columns are summarized here:

  Status           : Pending
  Id               : a2aba0dafad8436788401bbc8c22fe36.6faa388f56e6415492bec335192d2d00.v1-default-106
  DataTypeId       : GenericTSV
  Name             : my air quality
  FamilyId         : 6faa388f56e6415492bec335192d2d00
  SourceOrigin     : FromResourceUpload
  IsLatest         : TRUE
  DownloadLocation : https://esintussouthsus.blob.core.windows.net/uploadedresources/... (temporary SAS URL)

In [8]:
# download to check its content
head(download.datasets(ws, name = "my air quality"))


Out[8]:
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Delete a dataset

If the delete action is successful, the returned value of Deleted should be TRUE.


In [9]:
# delete dataset
delete.datasets(ws, name = "my air quality")


Out[9]:
            Name Deleted status_code
1 my air quality    TRUE         204

The "Airport Codes Dataset" is one of the dafault datasets in Azure ML. This example shows that the default datasets cannot be deleted.


In [10]:
# delete Azure sample dataset: not allowed
# Uncomment the following line to see the failure report

# delete.datasets(ws, name = "Airport Codes Dataset")

Work with experiments

The AzureML package allows you to get a summary of the existing experiments and to download the intermediate datasets.

List existing experiments

The experiments() function returns information about all experiments in the workspace, including the sample experiments provided by Microsoft.


In [11]:
# experiments
exps <- experiments(ws)
head(
    with(exps, data.frame(Description, ExperimentId, Creator, stringsAsFactors = FALSE))
    )
#head(cbind(Description = exps$Description, ExperimentId = exps$ExperimentId, Creator = exps$Creator))


Out[11]:
                        Description                                                             ExperimentId  Creator
1             Simple convert to TSV a2aba0dafad8436788401bbc8c22fe36.f-id.2237184cf35e42c4a41c0307f623a3f3 adevries
2    Experiment created on 2/4/2016 a2aba0dafad8436788401bbc8c22fe36.f-id.26730c0bfd0948afb8eb8b405bd264ba adevries
3  Experiment created on 12/18/2015 a2aba0dafad8436788401bbc8c22fe36.f-id.3292fcc8bcd14c29b75242f384190a41 adevries
4    Experiment created on 2/9/2016 a2aba0dafad8436788401bbc8c22fe36.f-id.63199eaa2e8e4da1beb4105b79b74988 adevries
5   Experiment created on 2/23/2016 a2aba0dafad8436788401bbc8c22fe36.f-id.9c98fd354ff2471bb6d3d5798ce35d16 adevries
6 Simple experiment to test Jupyter a2aba0dafad8436788401bbc8c22fe36.f-id.a0fbd6ccad204e65825b5b7e574c9be8 adevries

You can also filter the results by using the filter argument of the experiments() function.


In [12]:
# check sample experiments
e <- experiments(ws, filter = "samples")
head(e$Creator)
head(cbind(e$Description, e$ExperimentId))


Out[12]:
  1. "Microsoft Corporation"
  2. "Microsoft Corporation"
  3. "Microsoft Corporation"
  4. "Microsoft Corporation"
  5. "Microsoft Corporation"
  6. "Microsoft Corporation"
Out[12]:
Binary Classification: Network intrusion detection 506153734175476c4f62416c57734963.f-id.02556bd24faf4f099dc88c1d21262f36
Text Classification: Step 2 of 5, text preprocessing 506153734175476c4f62416c57734963.f-id.07811aadbdca4669a2d129abb083d2f0
Learning with Counts: Binary Classification 506153734175476c4f62416c57734963.f-id.08b6ef9574de46e7a9fdc74dc5221fb2
Clustering: Color quantization 506153734175476c4f62416c57734963.f-id.0a4458138dc140329a934393d7bbf2ed
Retail Forecasting: Step 4 of 6, train regression models 506153734175476c4f62416c57734963.f-id.0ec7e45f59244b63a994b86ae81d7f93
Binary Classification: Direct marketing 506153734175476c4f62416c57734963.f-id.11e8958b4571436d8848f2a75b348fdc

Download intermediate data

You can also download intermediate data from an experiment. To do this, you need values for four arguments:

  • experiment
  • node_id
  • port_name
  • data_type_id

To obtain this information, follow these steps:

  1. In your experiment, click the output port of the "Convert to CSV" module (Figure 1)
  2. Click "Generate Data Access Code..." to get the information shown in Figure 2
  3. Copy and paste the code from the "R" tab into your R session

Figure 1: Click the output port of the "Convert to CSV" module

Figure 2: Click "Generate Data Access Code..." and select the "R" tab

Copying the information to your local R session

You can copy and paste the code in the "R" tab into a local R session. When you evaluate the code, the data is downloaded from Azure ML Studio to your local session.


In [13]:
# download intermediate data

# Replace the code below with the snippet provided by Azure ML Studio
#exp_data <- download.intermediate.dataset(ws = ws, 
#            experiment  = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
#            node_id = "xxxxxxxx-xxx-xxx-xxxx-xxxxxxxxxxxx-xxx",
#            port_name = "Results dataset",
#            data_type_id = "GenericCSV")

#head(exp_data)

A concise way of publishing and consuming web services

The AzureML package also provides a very concise way of publishing and consuming web services. All you need is the web service ID and the workspace information; you can then use consume() to call the service from any R session with internet access.

For illustration purposes, we'll fit a linear model and deploy a web service based on it.

If you run this on Windows and encounter the error Requires external zip utility. Please install zip, ensure it's on your path and try again, install Rtools and add its installation directory to the system path. For example, if Rtools is installed in C:\Tools, add C:\Tools\bin to your system path and then restart R.
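
Alternatively, you can prepend that directory to the PATH for the current R session only; a minimal sketch, assuming the C:\Tools location used above:

# Windows only: make the Rtools zip utility visible to this R session
Sys.setenv(PATH = paste("C:\\Tools\\bin", Sys.getenv("PATH"), sep = ";"))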

Publishing the web service


In [14]:
# load the package
library(MASS)

# fit a model using all variables except medv as predictors
lm1 <- lm(medv ~ ., data = Boston)

# define predict function
mypredict <- function(newdata){
  predict(lm1, newdata)
}

# data used to test the prediction function and define the input schema
newdata <- Boston[1:10, ]

# Publish the service
ep <- publishWebService(ws = ws, 
                        fun = mypredict, 
                        name = "HousePricePrediction", 
                        inputSchema = newdata)
str(ep)


Warning message:
: package 'MASS' was built under R version 3.2.3
Classes 'Endpoint' and 'data.frame':	1 obs. of  14 variables:
 $ Name                 : chr "default"
 $ Description          : chr ""
 $ CreationTime         : chr "2016-03-13T18:26:32.567Z"
 $ WorkspaceId          : chr "a2aba0dafad8436788401bbc8c22fe36"
 $ WebServiceId         : chr "126375f0e94911e58e50518e3763ef03"
 $ HelpLocation         : chr "https://studio.azureml-int.net/apihelp/workspaces/a2aba0dafad8436788401bbc8c22fe36/webservices/126375f0e94911e58e50518e3763ef03"| __truncated__
 $ PrimaryKey           : chr "rhEGPgds8G+Hptp8OC0sYFH7CqyyQx8j3nG1Tj6kocWKBaoa52rr4zJ+zAwj7DeCwMiAiLydGZXfPlcIF4uv2A=="
 $ SecondaryKey         : chr "lth3osRs3BjpoWrEn7urQ5F0Lxd4qnOl8TdUHw58PuBWImpC+DiZm1Ua9LoFS+u9ZsoyVyefM6dlO/BFcRMTBQ=="
 $ ApiLocation          : chr "https://ussouthcentral.services.azureml-int.net/workspaces/a2aba0dafad8436788401bbc8c22fe36/services/a150c8f086974477a6d9d63e12"| __truncated__
 $ PreventUpdate        : logi FALSE
 $ GlobalParameters     :List of 1
  ..$ : list()
 $ MaxConcurrentCalls   : int 4
 $ DiagnosticsTraceLevel: chr "None"
 $ ThrottleLevel        : chr "Low"
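
If you later retrain or change the model, the existing service can be updated in place with updateWebService() instead of being published under a new name. A minimal sketch, not run here, assuming the serviceId argument described in ?updateWebService:

# Not run: update the deployed service after changing the model
# lm1 <- lm(medv ~ ., data = Boston)   # retrained model
# ep <- updateWebService(ws = ws,
#                        fun = mypredict,
#                        inputSchema = newdata,
#                        serviceId = ep$WebServiceId)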

Consuming the web service

Now you are ready to consume the web service.


In [15]:
# consume
consume(ep, newdata)


Request failed with status 401. Waiting 1.7 seconds before retry
..
Out[15]:
        ans
1  30.00384
2  25.02556
3   30.5676
4  28.60704
5  27.94352
6  25.25628
7  23.00181
8  19.53599
9  11.52364
10 18.92026
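
Because the service now lives in the workspace, you can also retrieve its endpoint by name from a completely separate R session and consume it there. A minimal sketch, not run here, assuming the service name used above:

# Not run: consume the service from another R session
# library(AzureML)
# ws <- workspace()
# s  <- services(ws, name = "HousePricePrediction")
# s  <- tail(s, 1)                  # last published service, in case of duplicates
# ep2 <- endpoints(ws, s)
# consume(ep2, newdata)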

Additional resources

The AzureML package also has a vignette Getting Started with the AzureML Package that covers a wider range of examples.

You can also take a look at the help page: ?publishWebService


Created by a Microsoft Employee.
Copyright © Microsoft. All Rights Reserved.